Week 6: Visualizing Uncertainty

Emorie D Beck

Piecing Plots Together

Packages

# | code-line-numbers: "11-12"
library(RColorBrewer)
library(knitr)
library(kableExtra)
library(plyr)
library(broom)
library(modelr)
library(lme4)
library(broom.mixed)
library(tidyverse)
library(ggdist)
library(cowplot)
library(ggExtra)
library(distributional)
library(gganimate)

Custom Theme:

my_theme <- function(){
  theme_classic() + 
  theme(
    legend.position = "bottom"
    , legend.title = element_text(face = "bold", size = rel(1))
    , legend.text = element_text(face = "italic", size = rel(1))
    , axis.text = element_text(face = "bold", size = rel(1.1), color = "black")
    , axis.title = element_text(face = "bold", size = rel(1.2))
    , plot.title = element_text(face = "bold", size = rel(1.2), hjust = .5)
    , plot.subtitle = element_text(face = "italic", size = rel(1.2), hjust = .5)
    , strip.text = element_text(face = "bold", size = rel(1.1), color = "white")
    , strip.background = element_rect(fill = "black")
    )
}

Review

  • Over the last several weeks, we have talked about:
    • tidying data
    • ggplot2 logic
    • visualizing proportions
    • visualizing differences
    • visualizing time series
    • visualizing uncertainty
  • For the rest of the course, we will pivot to taking everything we’ve learning and piecing it all together
    • Today: Piecing visualizations together
    • Next week: Polishing visualizations **
    • 11/21: Interactive Visualizations (shiny)

Today

  • There are lots of packages for piecing visualizations together
  • I have used lots and the only one that I can say I’ve actually liked in cowplot, so I’m going to teach you that
  • There are other more specialized packages worth mentioning

Today

ggextra

  • We’ll start with ggextra because it will help us create plots with distributions in the margins.
  • After, we’ll move to cowplot, where there will be lots of little odds and ends to step through

cowplot

  • Why cowplot?
    • figure alignment
    • easier to choose relative values and layouts
    • can mix base R plots and ggplot2 plots
    • allows you to annotate plots (including stacking, as opposed to layering)
    • shared legends!
    • includes the themes from his book

cowplot

Let me show you a couple of examples from my work that has used cowplot

plot_grid()

  • The core function of cowplot is plot_grid(), which allows us to place differnt figures within the same figure in a grid, and it has a lot of useful arguments
  • plotlist = NULL
  • align = c("none", "h", "v", "hv")
  • axis = c("none", "l", "r", "t", "b", "lr", "tb", "tblr")
  • nrow = NULL
  • ncol = NULL
  • rel_widths = 1
  • rel_heights = 1
  • labels = NULL
  • label_size = 14
  • label_fontfamily = NULL
  • label_fontface = "bold"
  • label_colour = NULL
  • label_x = 0
  • label_y = 1
  • hjust = -0.5
  • vjust = 1.5
  • scale = 1
  • greedy = TRUE
  • byrow = TRUE
  • cols = NULL
  • rows = NULL

plot_grid()

  • Let’s build up our use cases incrementally!
  • But first, we need some plots to plot!
  • Remember these data?
load(url("https://github.com/emoriebeck/psc290-data-viz-2022/blob/main/04-week4-associations/04-data/week4-data.RData?raw=true"))
pred_data
# A tibble: 5,021 × 25
   study  o_value p_year SID     p_value     age gender grsWages parEdu race 
   <chr>  <fct>    <dbl> <chr>     <dbl>   <dbl> <fct>     <dbl> <fct>  <fct>
 1 Study1 0         2005 61215      6.67 -29.9   1         1.02  2      0    
 2 Study1 0         2005 184965     0    -22.9   0         1.14  2      0    
 3 Study1 0         2005 488251    10     -3.92  1         0.717 1      0    
 4 Study1 0         2005 650779     7.22 -25.9   1         0.644 3      0    
 5 Study1 0         2005 969691     7.22  -0.925 1         0.812 2      0    
 6 Study1 0         2005 986687     6.11  14.1   0         1.76  2      0    
 7 Study1 0         2005 1054011    5.56   8.08  0         1.34  1      0    
 8 Study1 0         2005 1372251    7.78   5.08  1         0.842 1      0    
 9 Study1 0         2005 1496703    6.11 -23.9   0         1.42  2      0    
10 Study1 0         2005 1897887    2.78  38.1   1         0.725 2      0    
# … with 5,011 more rows, and 15 more variables: physhlthevnt <fct>,
#   SRhealth <dbl>, smokes <fct>, alcohol <fct>, exercise <dbl>, BMI <dbl>,
#   parDivorce <fct>, PhysFunc <fct>, religion <fct>, education <fct>,
#   married <fct>, numKids <dbl>, parOccPrstg <dbl>, reliability <dbl>,
#   predInt <dbl>

Example Setup

And remember these models?

tidy_ci <- function(m) tidy(m, conf.int = T)

nested_m <- pred_data %>%
  group_by(study) %>%
  nest() %>%
  ungroup() %>%
  mutate(
    m = map(data
            , ~glm(
              o_value ~ p_value
              , data = .
              , family = binomial(link = "logit")
              )
            )
    , tidy = map(m, tidy_ci)
  )
nested_m
# A tibble: 6 × 4
  study  data                  m      tidy            
  <chr>  <list>                <list> <list>          
1 Study1 <tibble [831 × 24]>   <glm>  <tibble [2 × 7]>
2 Study2 <tibble [1,000 × 24]> <glm>  <tibble [2 × 7]>
3 Study3 <tibble [1,000 × 24]> <glm>  <tibble [2 × 7]>
4 Study4 <tibble [574 × 24]>   <glm>  <tibble [2 × 7]>
5 Study5 <tibble [616 × 24]>   <glm>  <tibble [2 × 7]>
6 Study6 <tibble [1,000 × 24]> <glm>  <tibble [2 × 7]>

Example Setup

And remember these models?
Let’s do one small change

m_fun <- function(d) {
  glm(o_value ~ p_value + married + married:p_value
      , data = d
      , family = binomial(link = "logit"))
}
tidy_ci <- function(m) tidy(m, conf.int = T) %>% mutate(df.resid = m$df.residual, n = nrow(m$data))

nested_m <- pred_data %>%
  group_by(study) %>%
  nest() %>%
  ungroup() %>%
  mutate(
    m = map(data, m_fun)
    , tidy = map(m, tidy_ci)
  )
nested_m
# A tibble: 6 × 4
  study  data                  m      tidy            
  <chr>  <list>                <list> <list>          
1 Study1 <tibble [831 × 24]>   <glm>  <tibble [4 × 9]>
2 Study2 <tibble [1,000 × 24]> <glm>  <tibble [4 × 9]>
3 Study3 <tibble [1,000 × 24]> <glm>  <tibble [4 × 9]>
4 Study4 <tibble [574 × 24]>   <glm>  <tibble [4 × 9]>
5 Study5 <tibble [616 × 24]>   <glm>  <tibble [4 × 9]>
6 Study6 <tibble [1,000 × 24]> <glm>  <tibble [4 × 9]>

Example Setup

Here’s our unnested model terms

nested_m %>% select(study, tidy) %>%
  unnest(tidy) %>%
  mutate_at(vars(estimate, conf.low, conf.high), exp)
# A tibble: 24 × 10
   study  term     estim…¹ std.e…² stati…³ p.value conf.…⁴ conf.…⁵ df.re…⁶     n
   <chr>  <chr>      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <int> <int>
 1 Study1 (Interc…   1.33   0.391    0.727  0.467   0.618    2.87      827   831
 2 Study1 p_value    0.900  0.0572  -1.84   0.0655  0.804    1.01      827   831
 3 Study1 married1   0.926  0.524   -0.147  0.883   0.330    2.59      827   831
 4 Study1 p_value…   1.02   0.0745   0.238  0.812   0.880    1.18      827   831
 5 Study2 (Interc…   0.705  1.59    -0.220  0.826   0.0250  18.1       992  1000
 6 Study2 p_value    1.09   0.218    0.376  0.707   0.702    1.71      992  1000
 7 Study2 married1   6.40   1.62     1.14   0.253   0.237  190.        992  1000
 8 Study2 p_value…   0.758  0.221   -1.25   0.211   0.478    1.18      992  1000
 9 Study3 (Interc…   6.03   1.20     1.49   0.135   0.581   68.2       996  1000
10 Study3 p_value    0.706  0.156   -2.23   0.0256  0.514    0.952     996  1000
# … with 14 more rows, and abbreviated variable names ¹​estimate, ²​std.error,
#   ³​statistic, ⁴​conf.low, ⁵​conf.high, ⁶​df.resid

Example Setup

But maybe we are particularly interested in the interaction between marital status and personality in predicting mortality, which we want to plot as a forest plot

nested_m %>% select(study, tidy) %>%
  unnest(tidy) %>%
  mutate_at(vars(estimate, conf.low, conf.high), exp) %>%
  filter(term == "p_value:married1")
# A tibble: 6 × 10
  study  term      estim…¹ std.e…² stati…³ p.value conf.…⁴ conf.…⁵ df.re…⁶     n
  <chr>  <chr>       <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <int> <int>
1 Study1 p_value:…   1.02   0.0745   0.238 0.812     0.880   1.18      827   831
2 Study2 p_value:…   0.758  0.221   -1.25  0.211     0.478   1.18      992  1000
3 Study3 p_value:…   1.21   0.161    1.17  0.242     0.886   1.68      996  1000
4 Study4 p_value:…   0.824  0.284   -0.683 0.495     0.471   1.46      570   574
5 Study5 p_value:…   0.618  0.159   -3.04  0.00239   0.449   0.838     612   616
6 Study6 p_value:…   0.957  0.109   -0.407 0.684     0.773   1.19      996  1000
# … with abbreviated variable names ¹​estimate, ²​std.error, ³​statistic,
#   ⁴​conf.low, ⁵​conf.high, ⁶​df.resid

Example Setup

  • We could hack our way to a forest plot in a single figure, but it never looks as nice as if we do it in two
    • the forest plot itself
    • the table of values

Example Setup: Forest Plot

p1 <- nested_m %>% select(study, tidy) %>%
  unnest(tidy) %>%
  mutate_at(vars(estimate, conf.low, conf.high), exp) %>%
  filter(term == "p_value:married1") %>%
  ggplot(aes(x = estimate, y = fct_rev(study))) + 
    labs(
      x = "Model Estimated OR (CI)"
      , y = NULL
      ) + 
    my_theme()
p1

Example Setup: Forest Plot

Let’s add our point estimates and uncertainty intervals

p1 + 
  stat_gradientinterval(
    aes(xdist = dist_student_t(df = df.resid, mu = estimate, sigma = std.error))
    , .width = c(.95, .99)
    , shape = "square"
  ) 

p1

Example Setup: Forest Plot

p1 <- nested_m %>% select(study, tidy) %>%
  unnest(tidy) %>%
  mutate_at(vars(estimate, conf.low, conf.high), exp) %>%
  filter(term == "p_value:married1")

p1 <- p1 %>% 
  mutate(study = factor(study, (p1 %>% arrange(desc(estimate)))$study)) %>%
  ggplot(aes(x = estimate, y = study)) + 
    labs(
      x = "Model Estimated OR (CI)"
      , y = NULL
      ) + 
    my_theme()
p1

Example Setup: Forest Plot

Let’s add our point estimates and uncertainty intervals

p1 <- p1 + 
  stat_gradientinterval(
    aes(xdist = dist_student_t(df = df.resid, mu = estimate, sigma = std.error))
    , .width = c(.95, .99)
    , shape = "square"
  ) 
p1

Example Setup: Forest Plot

p1 <- p1 + 
  geom_vline(aes(xintercept = 1), linetype = "dashed") 
p1 

Example Setup: Forest Plot Table

  • In a forest plot, we don’t just show estimates, we print them with the sample size
p2 <- nested_m %>% select(study, tidy) %>%
  unnest(tidy) %>%
  mutate_at(vars(estimate, conf.low, conf.high), exp) %>%
  filter(term == "p_value:married1")

stdy_levs <-  tibble(num = 1:6, new = (p2 %>% arrange(desc(estimate)))$study)

p2 <- p2 %>%
  arrange(desc(estimate)) %>%
  mutate(study = factor(study, stdy_levs$new)
         , study2 = 1:n()) %>%
  mutate_at(vars(estimate, conf.low, conf.high), ~sprintf("%.2f", .)) %>%
  mutate(est = sprintf("%s [%s, %s]", estimate, conf.low, conf.high)
         , n = as.character(n)) %>%
  select(study, study2, estimate, n, est) %>%
  pivot_longer(
    cols = c(est, n)
    , values_to = "lab"
    , names_to = "est"
  )
p2
# A tibble: 12 × 5
   study  study2 estimate est   lab              
   <fct>   <int> <chr>    <chr> <chr>            
 1 Study3      1 1.21     est   1.21 [0.89, 1.68]
 2 Study3      1 1.21     n     1000             
 3 Study1      2 1.02     est   1.02 [0.88, 1.18]
 4 Study1      2 1.02     n     831              
 5 Study6      3 0.96     est   0.96 [0.77, 1.19]
 6 Study6      3 0.96     n     1000             
 7 Study4      4 0.82     est   0.82 [0.47, 1.46]
 8 Study4      4 0.82     n     574              
 9 Study2      5 0.76     est   0.76 [0.48, 1.18]
10 Study2      5 0.76     n     1000             
11 Study5      6 0.62     est   0.62 [0.45, 0.84]
12 Study5      6 0.62     n     616              

Example Setup: Forest Plot Table

p2 <- p2 %>%
  ggplot(aes(x = est, y = study2)) + 
    labs(
      x = NULL
      , y = NULL
      ) + 
    my_theme()
p2

Example Setup: Forest Plot Table

p2 <- p2 + 
  geom_text(aes(label = lab))
p2

Example Setup: Forest Plot Table

p2 <- p2 + 
  theme_void()
p2

Example Setup: Forest Plot Table

p2 <- p2 + 
  geom_hline(aes(yintercept = 6.5)) + 
  theme(axis.line.x = element_line(color = "black"))
p2

Example Setup: Forest Plot Table

# "My~bold(Partly~Bold)~and~italic(Partly~Italic)~Text"
p2 <- p2 + 
  annotate("text"
           , x = "est" , y = 7
           , label = "b [CI]"
           , fontface = "bold"
           ) + 
  annotate("text"
           , x = "n", y = 7
           , label = "N"
           , fontface = "bold"
           ) 
p2

Example Setup: Forest Plot Table

p2 <- p2 + 
  scale_y_continuous(limits = c(.4,7.1))
p2

Example Setup: Back to the Forest Plot

We added an extra row at the top of the table, so we need to do that for the forest plot, too

p1 <- nested_m %>% select(study, tidy) %>%
  unnest(tidy) %>%
  mutate_at(vars(estimate, conf.low, conf.high), exp) %>%
  filter(term == "p_value:married1")

stdy_levs <-  tibble(num = 1:6, new = (p1 %>% arrange(desc(estimate)))$study)

p1 <- p1 %>%
  arrange(desc(estimate)) %>%
  mutate(study = factor(study, stdy_levs$new)
         , study2 = 1:n()) %>%
  ggplot(aes(x = estimate, y = study2)) + 
    labs(
      x = "Model Estimated OR (CI)"
      , y = NULL
      ) + 
    my_theme()
p1

Example Setup: Forest Plot

Add our point estimates and uncertainty intervals, along with the vertical line at OR = 1

p1 <- p1 + 
  stat_gradientinterval(
    aes(xdist = dist_student_t(df = df.resid, mu = estimate, sigma = std.error))
    , .width = c(.95, .99)
    , shape = "square"
  ) + 
  geom_vline(aes(xintercept = 1), linetype = "dashed") 
p1

Example Setup: Forest Plot

Change the y scale back

p1 <- p1 + 
  scale_y_continuous(limits = c(.4,7.1)
                     , breaks = seq(1,6,1)
                     , labels = stdy_levs$new)
p1 

Example Setup: Forest Plot

Add in that top bar

p1 <- p1 + 
  geom_hline(aes(yintercept = 6.5))
p1 

Example Setup: Forest Plot

Remove the y axis line

p1 <- p1 + 
  theme(axis.line.y = element_blank(), 
        axis.ticks.y = element_blank())
p1 

Example Setup: Forest Plot

And let’s block out where the dashed line touches the top:

p1 <- p1 + 
  annotate("rect"
           , xmin = -Inf
           , xmax = Inf
           , ymin = 6.51
           , ymax = Inf
           , fill = "white")
p1

plot_grid()

  • I know that was a lot, but such is the reality of ggplot – we have to hack it!
    • annotate() is a great tool for this
    • so are our scale_[map]_[type] functions, especially given the labels can be anything we want!
    • and our theme elements also let us hack many more parts!
  • The biggest trick to ggplot2 is simply having lots of tricks up your sleeve, which come from knowledge (and StackOverflow)

plot_grid()

  • But now that we have our plot, we want to put it together! Remember these?
  • plotlist = NULL
  • align = c("none", "h", "v", "hv")
  • axis = c("none", "l", "r", "t", "b", "lr", "tb", "tblr")
  • nrow = NULL
  • ncol = NULL
  • rel_widths = 1
  • rel_heights = 1
  • labels = NULL
  • label_size = 14
  • label_fontfamily = NULL
  • label_fontface = "bold"
  • label_colour = NULL
  • label_x = 0
  • label_y = 1
  • hjust = -0.5
  • vjust = 1.5
  • scale = 1
  • greedy = TRUE
  • byrow = TRUE
  • cols = NULL
  • rows = NULL

plot_grid()

  • But now that we have our plot, we want to put it together! Remember these?
  • plotlist = NULL
  • align = c("none", "h", "v", "hv")
  • axis = c("none", "l", "r", "t", "b", "lr", "tb", "tblr")
  • nrow = NULL
  • ncol = NULL
  • rel_widths = 1
  • rel_heights = 1
  • labels = NULL
  • label_size = 14
  • label_fontfamily = NULL
  • label_fontface = "bold"
  • label_colour = NULL
  • label_x = 0
  • label_y = 1
  • hjust = -0.5
  • vjust = 1.5
  • scale = 1
  • greedy = TRUE
  • byrow = TRUE
  • cols = NULL
  • rows = NULL

plot_grid()

plot_grid(
  p1, p2
)

Not bad, but we want to align our plots

plot_grid()

plot_grid(p1, p2, align = "h")

plot_grid(p1, p2, align = "v")

plot_grid(p1, p2, align = "hv")

Similar behavior, but "hv" leads to odd spacing

plot_grid()

plot_grid(p1, p2, axis = "t")

plot_grid(p1, p2, axis = "b")

plot_grid(p1, p2, axis = "tblr")

Doesn’t properly align our bottom because it’s not optimized for labels

plot_grid()

plot_grid(
  p1, p2
  , align = "h"
  , nrow = 1
  , rel_widths = c(.6, .4)
  )

Let our interval estimates shine

plot_grid()

plot_grid(
  p1, p2
  , align = "hv"
  , nrow = 2
  , rel_heights = c(.6, .4)
  )

We wouldn’t do this, but note that when we have rows, we use rel_heights

plot_grid(): Labels

plot_grid(p1, p2, align = "h", nrow = 1
          , rel_widths = c(.6, .4)
          , labels = "auto")

plot_grid(
  p1, p2, align = "h", nrow = 1
  , rel_widths = c(.6, .4)
  , labels = "AUTO")

plot_grid(): Labels

plot_grid(
  p1, p2, align = "h", nrow = 1
  , rel_widths = c(.6, .4)
  , labels = "AUTO"
  , label_size = 18 # 14 default
  , label_fontface = "bold.italic"
  , label_fontfamily = "Times"
  , label_colour = "purple" # u is sensitive
  )

plot_grid(): Labels

plot_grid(
  p1, p2, align = "h", nrow = 1
  , rel_widths = c(.6, .4)
  , labels = "AUTO"
  , label_size = 18 # 14 default
  , label_fontface = "bold.italic"
  , label_fontfamily = "Times"
  , label_colour = "purple" # u is sensitive
  , label_x = .5
  , label_y = .5
  )

plot_grid(): Labels

plot_grid(
  p1, p2, align = "h", nrow = 1
  , rel_widths = c(.6, .4)
  , labels = "AUTO"
  , label_size = 18 # 14 default
  , label_fontface = "bold.italic"
  , label_fontfamily = "Times"
  , label_colour = "purple" # u is sensitive
  , label_x = c(.1,.85)
  , label_y = c(.95,.1)
  )

plot_grid(): Labels

plot_grid(
  p1, p2, align = "h", nrow = 1
  , rel_widths = c(.6, .4)
  , labels = "AUTO"
  , label_size = 18 # 14 default
  , label_fontface = "bold.italic"
  , label_fontfamily = "Times"
  , label_colour = "purple" # u is sensitive
  , hjust = .5
  , vjust = .5
  )

plot_grid(): titles

It’d be nice if the title was centered, right?

plot_grid(
  p1 +
  labs(
    subtitle = "Conscientiousness x Marital Status"
    , title = "Mortality Odds"
    )
  , p2
  , align = "h"
  , nrow = 1
  , rel_widths = c(.6, .4)
)

plot_grid(): titles

It’d be nice if the title was centered, right?

p3 <- plot_grid(
  p1 
  , p2
  , align = "h"
  , nrow = 1
  , rel_widths = c(.6, .4)
)
p3

New grobs for drawing on our plots

  • To add the title, we need some new functions *cowplot also adds some other new tools to our repertoire:
    • ggdraw()
    • draw_label()
    • draw_plot_label()
    • draw_grob()
    • draw_image()

New grobs: ggdraw() + draw_label()

*ggdraw()is more or a setup function that allows us to add grobs on top * We'll use it withdraw_label()` to make our title (just some text to put on the plot

plot_grid(): titles

title <- ggdraw() + 
  draw_label(
    "Mortality Odds"
    , fontface = 'bold'
    , x = .5
    , hjust = .5
    , y = .8
  ) +
  draw_label(
    "Conscientiousness x Marital Status"
    , fontface = 'italic'
    , x = .5
    , hjust = .5
    , y = .2
  ) +
  theme(
    # add margin on the left of the drawing canvas,
    # so title is aligned with left edge of first plot
    plot.margin = margin(0, 0, 0, 7)
  )
title

plot_grid(): titles

p <- plot_grid(
  title
  , p3
  , nrow = 2
  , rel_heights = c(.1, .9)
)
p

New grobs for drawing on our plots

*cowplot also adds some other new tools to our repertoire: + ggdraw() + draw_label()

New grobs: draw_label()

  • draw_label() is meant to be a better wrapper for geom_text() that requires less customization
  • Say for example, we want to put a wordmark on our plots (there are journals that require this!)
  • Doing this with geom_text() would require 10+ arguments and has no easy application to figures put together with cowplot (or other packages for doing so)
ggdraw(p) + 
  draw_label("Draft", color = "grey80", size = 100, angle = 45)

New grobs: draw_plot()

  • Imagine you want to put a plot inside of another
inset <- 
  pred_data %>% 
  filter(study == "Study1") %>%
  ggplot(aes(y = gender, x = SRhealth, fill = gender)) + 
    scale_fill_manual(values = c("cornflowerblue", "coral")) + 
    scale_y_discrete(labels = c("Male", "Female")) + 
    stat_halfeye(alpha = .8) + 
    my_theme() + 
    theme(legend.position = "none") + theme_half_open(12)

p4 <- pred_data %>% 
  filter(study == "Study1") %>%
  ggplot(aes(x = p_value, SRhealth, fill = gender)) + 
    geom_point(shape = 21, color = "grey20", size = 3) + 
    scale_fill_manual(values = c("cornflowerblue", "coral"), labels = c("Male", "Female")) + 
    my_theme()

New grobs: draw_plot()

  • Imagine you want to put a plot inside of another
ggdraw(p4) + 
  draw_plot(inset, .1, .2, .6, .4)

New grobs: draw_image()

We can also add images!

p3

New grobs: draw_image()

We can also add images!

ggdraw() + 
  draw_plot(p3) + 
  draw_image(
    "https://github.com/emoriebeck/psc290-data-viz-2022/raw/main/01-week1-intro/02-code/02-images/ucdavis_logo_blue.png"
    , x = 1, y = 0.05, hjust = 1, vjust = 1, halign = 1, valign = 1,
    width = 0.15
  )

  • draw_plot_label()
    • draw_grob()
    • draw_image()